Minimum Error Rate Training for Bilingual News Alignment
Abstract
News articles in different languages on the same event are invaluable for analyzing standpoints and viewpoints in different countries. The major challenge in identifying such closely related bilingual news articles is how to take full advantage of various information sources such as length, translation equivalence and publishing date. Accordingly, we propose a discriminative model for bilingual news alignment, which is capable of incorporating arbitrary information sources as features. Chinese word segmentation, part-of-speech tagging and named entity recognition technologies are used to calculate the semantic similarities between words or texts as feature values. The feature weights are optimized using the minimum error rate training algorithm, so that the training objective correlates directly with the evaluation metric. Experiments on Chinese-English data show that our method significantly outperforms two strong baseline systems by 12.7% and 2.5%, respectively.
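The idea in the abstract (score each candidate article pair with a weighted sum of features, then choose the weights that minimize alignment error on labeled data) can be illustrated with a minimal sketch. This is not the authors' implementation: the three features, their toy values, and the function names are hypothetical, and an exhaustive grid search stands in for the more efficient line search used in standard minimum error rate training.

```python
from itertools import product

def score(weights, features):
    """Linear model: weighted sum of the feature values of one candidate pair."""
    return sum(w * f for w, f in zip(weights, features))

def align(weights, candidates):
    """For each source article, pick the target article with the highest score."""
    return {
        src: max(cands, key=lambda c: score(weights, c[1]))[0]
        for src, cands in candidates.items()
    }

def error_rate(weights, candidates, gold):
    """Fraction of source articles aligned to the wrong target."""
    predicted = align(weights, candidates)
    wrong = sum(1 for src, tgt in gold.items() if predicted[src] != tgt)
    return wrong / len(gold)

def tune(candidates, gold, grid=(0.0, 0.5, 1.0), n_features=3):
    """Return the weight vector on the grid with the lowest training error.

    Simplification: real MERT does not enumerate a grid; this only shows
    that the training objective is the evaluation metric itself.
    """
    return min(
        product(grid, repeat=n_features),
        key=lambda w: error_rate(w, candidates, gold),
    )

# Hypothetical toy data. Each feature vector is
# (length similarity, date proximity, translation overlap), all in [0, 1].
candidates = {
    "zh1": [("en1", (0.9, 1.0, 0.8)), ("en2", (0.95, 0.2, 0.1))],
    "zh2": [("en1", (0.9, 0.9, 0.2)), ("en2", (0.6, 0.3, 0.9))],
}
gold = {"zh1": "en1", "zh2": "en2"}

uniform = (1.0, 1.0, 1.0)
best = tune(candidates, gold)
print("uniform weights error:", error_rate(uniform, candidates, gold))
print("tuned weights:", best, "error:", error_rate(best, candidates, gold))
```

On this toy set, uniform weights misalign one of the two articles, while the tuned weights favor the translation-overlap feature and align both correctly. The point of the sketch is only that the weights are selected by the error metric directly rather than by a surrogate likelihood.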
Related Resources
The ISI/USC MT system
The ISI/USC machine translation system is a statistical system based on a phrase translation model that is trained on bilingual parallel data. This translation model is combined with several other knowledge sources in a log-linear manner. The weights of the individual components in the log-linear model are set by an automatic parameter-tuning method. The system described here has been developed...
Improved Discriminative Bilingual Word Alignment
For many years, statistical machine translation relied on generative models to provide bilingual word alignments. In 2005, several independent efforts showed that discriminative models could be used to enhance or replace the standard generative approach. Building on this work, we demonstrate substantial improvement in word-alignment accuracy, partly through improved training methods, but predomi...
Search for Discriminative Word Alignment via Dual Decomposition
Shiqi Shen, Yang Liu and Maosong Sun (Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China) Abstract: Word alignment aims to calculate the corresponding relationship between the words in parallel texts. It has important influence on machine translation, bilingual dictionary construction and many other natu...
Improving Domain-Specific Word Alignment with a General Bilingual Corpus
In conventional word alignment methods, some employ statistical models or statistical measures, which need large-scale bilingual sentence-aligned training corpora. Others employ dictionaries to guide alignment selection. However, these methods achieve unsatisfactory alignment results when performing word alignment on a small-scale domain-specific bilingual corpus without terminological lexicons....
Reducing Parameter Space for Word Alignment
This paper presents the experimental results of our attempts to reduce the size of the parameter space in a word alignment algorithm. We use IBM Model 4 as a baseline. In order to reduce the parameter space, we pre-processed the training corpus using a word lemmatizer and a bilingual term extraction algorithm. Using these additional components, we obtained an improvement in the alignment error rate.